ABSTRACT

Standardized achievement tests and instructional 
accomplishment inventories involve different methodologies and cannot 
be equated by using conventional psychometric methods. Instructional 
accomplishment inventories are descriptive, and are designed to 
reflect the scope, sequence, and skills and emphasis in a particular 
instructional program. Standardized achievement tests are designed to 
discriminate between students and do not represent actual 
instruction. These tests can be equated using a qualitative method 
which requires a matching of instrument structures at three levels: 
general instrument, subcategories, and items. The analysis is 
performed in sequence at each level to show the correspondence 
between skills reflected in the instrument. An example of qualitative 
equating for the Comprehensive Tests of Basic Skills and the Los 
Angeles City Schools' Survey of Essential Skills in reading and 
mathematics for grades 3 and 6 is given. Qualitative analysis may 
reveal that there is no meaningful basis for statistical equating. If 
the testing instruments do have a qualitative relationship, the 
statistical relationship between the instruments takes on a better 
informed meaning. (8S) 

ABSTRACT 

A method for qualitatively equating achievement measures is described 
and illustrated by an example applied to elementary reading and mathematics. 
The method complements and extends conventional methodology for quantitatively 
equating educational testing instruments. 
EQUATING INSTRUCTIONAL ACCOMPLISHMENT INVENTORIES AND STANDARDIZED 
ACHIEVEMENT TESTS 

Patricia A. Milazzo and Aaron D. Buchanan 

Statistical procedures derived from classical psychometric theory 
and practice underlie a methodology that has been used for many years to 
equate all variants of standardized achievement tests (see Thorndike, 
1971). The methodology also has been applied successfully to 
criterion-referenced achievement tests, with appropriate caveats about the 
distinction between "equivalence" and "comparability." Instructional 
accomplishment inventories (e.g., SWRL's Proficiency Verification 
Systems, Los Angeles' Survey of Essential Skills, Sacramento's 
Proficiency Survey, the District of Columbia's Competency Based Assessment), 
however, are fundamentally different from both norm-referenced and 
criterion-referenced tests in ways that make the conventional statistical 
equating information inadequate in the absence of analytic equating 
information. 

The present paper presumes the reader is familiar with standard 
statistical equating and with the general methodology underlying 
standardized achievement tests. The paper describes briefly the general 
methodology underlying instructional accomplishment inventories. It 
then outlines the relationship of both standardized achievement tests 
and instructional accomplishment inventories to instruction. With this 
information as background, the paper describes a method for performing 
qualitative equating and illustrates the method with a sample analysis. 

Methodology Underlying Instructional Accomplishment Inventories 

The determining difference between standardized achievement tests 
and instructional accomplishment inventories lies in the distinction 
between psychometric methodology (applicable to standardized achievement 
tests) and survey research methodology (applicable to instructional 
accomplishment inventories). Although some pertinent principles and 
features of psychometric technology may be applied in the development 
and use of instructional accomplishment inventories, many key aspects of 
psychometric methodology (e.g., validity, reliability, item analysis 
statistics) are not directly applicable. In place of psychometric 
methodology, instructional accomplishment inventories make use of 
methodology derived from social science survey research. The critical 
features of instructional accomplishment inventories are representation 
of instructional scope and sequence, representativeness of performance 
modes that are directly familiar to the respondent, and clarity of 
question and response. These factors derive from two logical tenets 
fundamental to survey research: ask questions that are most 
representative of an area of interest, and remove as much ambiguity as possible 
from every question and every set of responses (Babbie, 1976; Goode & 
Hatt, 1952). An example from survey research helps to explain the survey 
approach. If one is interested in knowing how the 1984 Republican 
presidential primary is shaping up, a survey item could look like this: 



Who would you vote for in the 1984 Republican 
presidential primary? 

a. Ronald Reagan 

b. George Bush 

c. Howard Baker 

d. Don't know 



For the sake of the example, pretend that 80 percent of the respondents 
choose a, 15 percent choose b, 5 percent c, and 0 percent d. From a 
survey perspective, the item would be reviewed on the following basis: 
Are the question and the responses unambiguous? Is the substance of 
both the question and the responses relevant to the area of interest, 
the Republican presidential primary? Survey items are frequently refined 
if questions like these are answered with "no." However, there is no 
need for alarm, at least not from a measurement point of view, if most 
respondents load on the "a" response. The item is intended to obtain 
descriptive information at a specific point in time, and readers may, 
or may not, decide to take some campaign action on the basis of this 
information. 
From a psychometric perspective, if this survey question were treated 
like a test item, the question and/or responses would most likely be 
adjusted in order to relieve the loading on the "a" response. A number 
of standard techniques are possible for distributing response choices. 
For example, an "e" response for another likely candidate could be added; 
some ambiguity could be added by using names such as Donald Regan, by 
adding a fictitious name such as Ronald Bush, or a plausible but 
irrelevant name such as Ted Kennedy. These kinds of adjustments should move 
response choices around somewhat, taking the load off of response "a." 
By making this type of psychometric adjustment, items become better "test" 
items, but the descriptive power of the information is seriously reduced. 

Instructional accomplishment inventories have a similar power to 
describe performance on specific skills at specific points in the 
schooling year. The overriding concern is to reflect accurately the scope, 
sequence and emphasis of skills represented directly in instruction and 
practice. Information gathered from surveys has only incidental utility 
for making discriminations among respondents; that is not their purpose. 
The fundamental purpose of an instructional accomplishment inventory is 
to describe, not to discriminate. 

Given this distinction, the equating of instructional accomplishment 
inventories and standardized achievement tests might seem an unnecessary 
exercise. However, the matter cannot be dismissed so easily. 
Standardized achievement tests have come to be the standard by which the profession 
and the public insist that instructional programs, and eventually schools, 
be evaluated. For this reason, any alternative is obligated to justify 
itself against this standard. (E.g., "These results are all well and good, 
but how would the kids do on a standardized test?") Until the 
considerations involved in equating the two types of instruments are understood, 
standardized achievement tests will continue to provide the exclusive 
gauge of instructional program effectiveness, despite their acknowledged 
deficiency for this purpose (Buros, 1977; Tyler, 1971; Nader, 1979; 
Airasian, 1979). 
Relationship of Standardized Achievement Tests to Instruction 

Whereas instruction on a given skill is designed to close the gaps 
between students in what they learn, standardized tests are designed to 
identify gaps between what students have learned, to spread students out 
relative to each other: 

"... It would therefore be a mistake to conclude that an item 
with a relatively small percent of pupils, say 35 percent, 
answering correctly represents a skill which needs immediate attention. 
This could be an item which represents a level of performance that 
few pupils should be expected to master (italics added)." (Houghton 
Mifflin Company, Boston: Iowa Test of Basic Skills, Item Performance 
Analysis, Forms 5 and 6, Grade 6, Level 12, 9-67535, Copyright 1971.) 

When instruction has been highly successful in teaching a skill to 
nearly all students, which is often true in the elementary grades, a 
standardized test cannot align well with the skill because performance 
scores would be "too high" to discriminate between students. To obtain 
the intended discrimination, the test is made more difficult than 
instruction would merit. By adding a level of difficulty (sometimes two or three 
levels of difficulty) to test items, test scores can be made to spread out 
in a downward direction. Hence the tendency for these tests to 
underestimate the instructional accomplishments of students with the least 
instructional opportunity or success. Similarly, when instruction is 
largely unsuccessful in teaching a skill to nearly all students, the 
standardized test cannot align well with the skill that a large majority 
of students have not learned, because performance scores will tend to 
cluster in the low ranges. In this case, widespread instructional 
malachievement makes it difficult to discriminate between students. A level 
or two of difficulty can be removed from test items, forcing scores to 
spread out in an upward direction. Hence the tendency for standardized 
tests to overestimate the instructional accomplishments of students with 
the most instructional opportunity or success. This practice occurs often 
in grades four, five and six, where the scope and substance of instruction 
is frequently very difficult. 
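
The connection between item difficulty and the capacity to discriminate can be illustrated with a standard result that the paper itself does not state: for an item scored right or wrong, the score variance is p(1 - p), where p is the proportion of students answering correctly. The sketch below is a minimal illustration under that assumption; the function name and the proportions are hypothetical and are not taken from CTBS or SES.

```python
def item_variance(p: float) -> float:
    """Variance of a right/wrong (0/1) item answered correctly by proportion p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return p * (1.0 - p)


# Illustrative proportions correct (hypothetical, not from either instrument).
# Variance is largest near p = .50 and shrinks toward zero as nearly all
# students answer correctly (or incorrectly), leaving little spread to
# separate one student from another.
for p in (0.05, 0.35, 0.50, 0.80, 0.95):
    print(f"p = {p:.2f}  variance = {item_variance(p):.4f}")
```

Under this reading, a skill that instruction teaches to nearly all students yields items with very little score spread, which is consistent with the adjustment of item difficulty described above.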
Standardized tests align best with instruction that is partially 
successful, that teaches specific skills to just some, but not all, 
students. When instruction itself tends to spread performance scores 
out from high to low on specific skills, there is no need to tinker 
with item difficulty, since it is possible to discriminate between 
students by aligning with instruction on these skills. However, data 
on what is taught and learned indicate that there are a limited number 
of skills at each grade level which actually fit the paradigm of partial 
learning; i.e., are learned well by some students, learned partly by 
other students, and not learned by still other students. (See Los Angeles 
Unified School District, 1979; Sacramento City Unified School District, 
1979; Buchanan & Milazzo, 1978.) Moreover, a close look at such skills, 
which do show differential performance scores across groups, tends to 
mitigate a good bit of alarm about low performance scores. Many of 
these skills are indirect extensions of skills learned through direct 
instruction and practice, and performance requires transferring knowledge 
about a learned skill to a new application; the skills may be embedded 
in rare, or at least unusual, performance formats; or the skills may 
involve a high degree of ambiguity about what is being taught and what 
performances are expected .... But, grade-by-grade, the skills do not 
often represent a large investment of instructional time, or a high 
expectation for mastery. Other skills are ones that are taught across 
several grades and, sooner or later, they are learned by most students 
along the way. The lowest scores on these skills occur at the earliest 
grade levels, where the instructional investment is low and the intention 
is to introduce skills which will be thoroughly taught at the next higher 
grade. Students who do best in learning such skills are not absent much; 
they pay attention in class; they do independent seat work and homework; 
they are troubled less by problems outside the classroom; they are 
consistently high achievers; all characteristics which are not very 
surprising. 

On the other hand, when one looks at skills that have the largest 
commitment of lesson space at each grade and the greatest impact on 
grade-by-grade achievement, the effects of instruction tend to be much 
more common across all groups of students, including students who are 
considered to be "low achievers." 



Relationship of Instructional Accomplishment Inventories to Instruction 

The model that underlies instructional accomplishment inventories 
has no pre-established requirement for item difficulty or for the shape 
of a performance score distribution. Because the model is rigorously 
descriptive, the critical factor for survey instruments is goodness of 
fit with the scope, sequence, and emphasis of skills taught in a 
particular program of instruction. The structure and substance of an instructional 
accomplishment inventory are formed and justified on the basis of that fit, 
independent of the statistical characteristics of items or scores. 




The scope, substance, and format of instructional accomplishment 
inventories are derived from the instructional objectives and resources 
to which a district is committed. Item formats are then designed to 
reflect, as much as possible, highly familiar, representative practice 
formats from that instruction. Weight given to the various subcategories 
in each major skill category of each subject area is determined by the 
lesson emphasis that the skills receive in a district's program of 
instruction. 

Unlike standardized achievement tests, goodness of fit for 
instructional accomplishment inventories does not depend on how successful 
instruction has been. If substantial amounts of lesson space are 
dedicated to teaching specific skills, then the skills should be 
represented in an inventory, regardless of how this practice affects 
the overall performance score distribution. 

Description and Illustration of a Method for Qualitative Equating 

The method requires a matching of the structure of the instruments 
to be equated at three levels: general instrument, subcategories, and 
items. The analysis is performed in sequence at each level to show the 
correspondence between the skills reflected in the instruments. 

The method is most conveniently explained via an example derived 
from SWRL's collaborative effort with Los Angeles City Schools to help 
the District implement its grade-by-grade promotion policy for elementary 
school students. In this connection, the District administers annually 
an instruction-based accomplishment inventory (Survey of Essential Skills, 
SES) to more than 300,000 elementary students. The District also wanted 
to use the SES as a part of the evaluation of its ESEA Title I program. 
Federal regulations permitted the District to use the SES for Title I 
reporting purposes, if the SES were equated with a standardized 
achievement test. 

The example presented here illustrates the qualitative equating 
that was done for the Comprehensive Tests of Basic Skills (CTBS) and the SES 
in the subject areas of reading and mathematics for grades 3 and 6. 

Stage One: Equating General Categories 

The first operation in this stage is strictly descriptive. It 
provides a simple listing of the querying and reporting categories that 
are named in each instrument. Tables 1 and 2 show the structures of 
CTBS and SES for grades 3 and 6 respectively. For example, the CTBS 
instrument that is recommended for use at grade 3 shows two general 
querying and reporting categories in reading and two in mathematics. 
The SES instrument for grade 3 shows five such categories for reading 
and eight for mathematics. 

In the second operation in this stage the broad skill categories 
from each instrument are compared and "matched." The task is obviously 
easiest to accomplish where the two instruments have skills categories 
with identical titles. For example, in Tables 1 and 2, CTBS has a broad 
skill category labeled vocabulary, so does the SES; CTBS has a broad 
skill category labeled comprehension, so does the SES. Therefore, at 
the grossest level of comparison, CTBS and SES reading instruments for 
these grades show at least two broad skill categories that are nominally 
matched. 

Occasionally, broad skill categories may not match exactly in name, 
but the constructs are alike. For example, if one instrument has the 
skill category "word meaning," and the second instrument has the skill 
category "vocabulary," these constructs would be considered a match. 
Similarly, the broad skill categories on one instrument may encompass 
several of the broad skill categories on a second instrument. This 
occurs in the mathematics portions of Tables 1 and 2. Although there 
may be no nominal counterpart from one instrument to the next, if 
constructs are similar, "matching" can be accomplished by simply collapsing 
or unfolding the broad skill categories on one of the instruments. For 
example, CTBS has a reporting category labeled "computation," for which 
there is no direct counterpart on the SES. However, the SES does have 
two querying and reporting categories that are clearly computation, 
"Addition and Subtraction of Whole Numbers," and "Multiplication and 
Division Facts." Without creating new constructs, the two instruments 
can be linked at this level by collapsing the two SES computation 
categories, or unfolding the broader CTBS category. 
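
The collapsing operation can be sketched as a simple crosswalk from SES categories to CTBS reporting categories. The sketch below is only an illustration: the dictionary and function names are hypothetical, and the entries cover just the grade 3 reading categories and the two computation categories named in the text, with `None` marking an SES category that has no CTBS counterpart.

```python
# Hypothetical crosswalk: SES broad category -> CTBS reporting category.
# None marks an SES category with no counterpart on CTBS (grade 3 example).
SES_TO_CTBS = {
    "Vocabulary": "Vocabulary",
    "Comprehension": "Comprehension",
    "Decoding": None,
    "Structural analysis": None,
    "Location/study skills": None,
    "Addition and subtraction of whole numbers": "Computation",
    "Multiplication and division facts": "Computation",
}


def collapse(crosswalk):
    """Group SES categories under the CTBS category they are matched to."""
    groups = {}
    for ses_category, ctbs_category in crosswalk.items():
        if ctbs_category is not None:
            groups.setdefault(ctbs_category, []).append(ses_category)
    return groups


def unmatched(crosswalk):
    """List SES categories that have no CTBS counterpart."""
    return [ses for ses, ctbs in crosswalk.items() if ctbs is None]


print(collapse(SES_TO_CTBS))
print(unmatched(SES_TO_CTBS))
```

Reading the same crosswalk in the other direction corresponds to "unfolding" the broader CTBS category into its SES components.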

The results of this first stage of analysis will reveal the general 
relationships between the instruments, according to "structure" (the 
number and allocation of items) and to "substance" (types of broad 
skill categories assessed and reported). The intention in this earliest 
stage of comparison should be to apply a liberal metric for equating, to 
accept as much of the total instruments as possible for the next level 
of the analysis. 

Stage Two: Subcategories Within General Skills Categories Which Are Matched 

The second stage of analysis focuses on those broad skills categories 
which were found to be nominally alike in stage one. The intention is to 
Table 1 

Comparison of General Skill Areas 
on CTBS Level 1, Form S and SES Grade Three 

Skill Areas                                    CTBS              SES 

Reading                                        85 items          53 items 
  Vocabulary                                   X                 X 
  Comprehension                                X                 X 
  Decoding                                                       X 
  Structural analysis                                            X 
  Location/study skills                                          X 

Mathematics                                    98 items          60 items 
  Addition and subtraction of whole numbers    X                 X 
  Multiplication and division facts            (reported as      X 
                                               computation) 
  Numeration                                                     X 
  Fractional numbers                           X                 X 
  Geometry                                     (reported as      X 
  Measurement                                  concepts and      X 
  Relations/functions/statistics               applications)     X 
  Applications                                                   X 

Total number of general skill areas            5*                13 

*Concepts and applications are reported as a single skill area on 
CTBS. 
Table 2 

Comparison of General Skill Areas 
on CTBS Level 2, Form S and SES Grade Six 

Skill Areas                                    CTBS              SES 

Reading                                        85 items          62 items 
  Vocabulary                                   X                 X 
  Comprehension                                X                 X 
  Decoding                                                       X 
  Structural analysis                                            X 
  Reference/study skills                                         X 

Mathematics                                    98 items          63 items 
  Addition and subtraction of whole numbers                      X 
  Multiplication and division of whole         (reported as      X 
    numbers                                    computation) 
  Computation with fractions and decimals                        X 
  Numeration                                                     X 
  Fractional numbers                           X                 X 
  Geometry                                     (reported as      X 
  Measurement                                  concepts and      X 
  Relations/functions/statistics               applications)     X 
  Applications                                 X                 X 

Total number of general skill areas                              14 

*Concepts and applications are reported as a single skill area 
on CTBS. 
achieve a finer level of detail than the gross reporting categories, and 
to begin to integrate the structures of the two instruments. 

For convenient comparison and matching, the large number of items 
in each skills category is organized into more homogeneous subcategories, 
which would be meaningful across the two instruments at the grade level 
of concern. At this stage, the primary concern is simply to remove a 
layer of structural complexity represented by the two instruments, prior 
to establishing linkages between individual items. In our example, it 
was possible to use existing indexes in reading or mathematics which have 
enough surface detail to permit a breakdown of large skill constructs, 
such as vocabulary or computation, into more homogeneous and descriptive 
subconstructs. Table 3 shows the original subcategorization schema 
selected for the reading analysis, which was adapted from a much more 
detailed index (see Fiege-Kollmann, 1977). The single asterisks in 
Table 3 indicate the subcategories which became meaningful in the actual 
analysis of CTBS and SES reading instruments at grades three and six 
(i.e., the subcategories which were actually used by the coders). Table 4 
shows the same schema for mathematics, also adapted from a much larger 
index (see Buchanan, 1976). Occasionally, constructs turn up on reading 
instruments which are not part of the original schema for a particular 
broad skill category, such as comprehension. If the instruments being 
equated are to be described in much detail, these constructs require the 
addition of separate subcategories to the schema. For this reason, any 
coding schema that is used should be treated as an open-ended framework, 
a tool that can be refined as the need arises in the analysis procedure. 
The double asterisks in Table 3 are an example of subcategories that were 
added by coders. The literary constructs with double asterisks rarely 
appear under "comprehension skills" in most conventional indexes. However, 
they did appear in the CTBS test under comprehension, and they were 
included in the coding schema. 

One coder with background experience in reading instruction, and one 
with background experience in mathematics instruction, were asked to code 
Table 3: Subcategories for Use in Reading Instrument Analysis 

VOCABULARY SUBCATEGORIES 

 *  Synonyms, given minimal or no context 
 *  Antonyms 
 *  Definitions 
 *  Words contextually cued (common, homonyms, homographs, multiple meaning) 
 *  Function words (prepositions, pronouns, proforms) 
 *  Figurative language 

COMPREHENSION SUBCATEGORIES 

 *  Facts, details 
 *  Sequence of events 
 *  Cause-effect 
    Main idea (topic, title) 
 *  Conclusions 
 *  Predictions/judgments 
    Following directions 
    Comparisons 
    Contrasts 
    Analogies 
 *  Classification 
 *  Relevant versus irrelevant 
 ** Figurative context/devices 
 ** Quotations 
 ** Poems/poetry elements 

 *  Categories actually used by coders in the analysis 
 ** Categories added by coders and actually used in the analysis 
Table 4: Subcategories for Use in Mathematics Instrument Analysis 

COMPUTATION 

 *  Addition and subtraction of basic facts 
    Addition and subtraction of 2-digit numbers, no regrouping 
 *  Addition and subtraction of 2-digit numbers, regrouping 
 *  Addition and subtraction of 3-digit numbers, no regrouping 
 *  Addition and subtraction of 3-digit numbers, regrouping 
 *  Addition and subtraction of large numbers 
 *  Multiplication and division facts 
 *  Multiplication by 1-digit multipliers, no regrouping 
 *  Multiplication by 1-digit multipliers, regrouping as necessary 
 *  Multiplication, large numbers, by 2-3-digit multipliers 
 *  Division by 1-2 digit divisors, no regrouping 
 *  Division by multiple of ten 
 *  Division by 1-digit divisors, regrouping as necessary 
 *  Division, large numbers, by 2-3-digit divisors 
    Addition and subtraction of fractions, like denominators, no regrouping 
 *  Addition and subtraction of fractions, like denominators, regrouping 
    as necessary 
    Addition and subtraction of fractions, unlike denominators, no 
    regrouping 
    Addition and subtraction of fractions, unlike denominators, regrouping 
    as necessary 
 *  Multiplication and division of fractions 
 *  Addition and subtraction of decimals 
 *  Multiplication and division of decimals by whole numbers, 10, 100 
    Multiplication of decimals by decimals 
 *  Division of decimals by decimals 
 *  Actually used in the analysis 

1 This paper discusses only the computation sections of the two 
instruments; and, therefore, only the computation portion of the 
mathematics index is provided here. See Buchanan, 1977, for the 
complete index. 

grades three and six, using the specific subcategories shown in Tables 3 
and 4. Coders were told to be consistent in classifying items at both 
grade levels of both instruments. The critical practice in this stage 
of the analysis is to apply the categorization schema systematically to 
both instruments. The procedure is most stable when a coder is responsible 
for both instruments in one grade level. 

Stage Three: Items Within Subcategories Which Are Matched 

The third, most specific level of analysis focuses on items in those 
subcategories which are nominally alike. In the example analysis, two item 
features were identified as plausible, but not necessarily the only, 
indicators of item equatability: the skill assessed and the item difficulty 
(in percent correct). In dealing first with skills actually assessed, it 
seemed reasonable to assume that items in the same subcategory attended 
to, more or less, the same skill constructs. Item difficulty, on the 
other hand, required some additional considerations. The initial task 
was to establish a range of item difficulties in each subcategory. This 
was done by arraying difficulty values for each item in the matched 
subcategories from most difficult to least difficult. Table 5 gives an 
example of the array for reading comprehension at grade three. Using this 
type of table, items that fall strictly within overlapping difficulty 
ranges are analyzed first; then items that are adjacent to this range of 
difficulty values are analyzed. It is usually reasonable to stay within 
a range of plus or minus .25 from the extreme values on the strictly 
overlapping items since this scope is broad enough to permit a qualitative 
analysis of many items. A reading specialist reviewed and rated the items 
in both instruments as "more or less similar," or "more or less dissimilar." 
A mathematics specialist completed the same activity for the mathematics 
instruments. Items were reviewed on a number of features, such as language 



At this point, a methodological note is in order. Researchers often 
forget that their categorization schemas are constructed, not divined. 
While some schemas may have more, or less, of a descriptive relationship 
to instructional materials than others, one would be hard-pressed to identify 
the "best," or, even more optimistically, the "right" set of constructs. 
Whatever set is used, methodologically, the fundamental concern should be 
with systematic application across the two instruments being equated. 
Table 5: Array of Item Difficulty in Percent of Right Response 
for CTBS and SES Reading Instruments at Grade Three 

Subcategories arrayed: Comprehension, Details; Comprehension, Sequence; 
Comprehension, Conclusions 

Bands marked in the array: range +/- .25; overlapping difficulty ranges 

CTBS 
  Item No.                             Ordered Difficulty 
  11, 39, 9                            .32, .35 
  12, 18, 38, 37, 36, 26,              .55, .57, .57, .59, .61, .62, 
  17, 10, 1, 29, 21, 28, [illegible]   .63, .64, .65, .65, .66, .67 
  [illegible]                          .71, [illegible] 
  21, 8                                .61, .63 
  32, 33, 19, 24, 25                   [illegible], [illegible], .51, .51, .51 
  13, 23, 22, 30, 31, 3                .53, .53, .54, .58, .62, .66 
                                       .71, .80 

SES 
  Item No.                             Ordered Difficulty 
  35                                   .72, .73 
  33                                   .79 
  37, 36, 38                           .58, .66, .68 
  38                                   .75, .77 
level, syntax, semantics, format, contextual clarity, response 
discriminations, specificity of the subskill (e.g., regrouping with zeros), and so 
on. The intention was to be liberal in identifying items which might be 
thought similar, thus producing as large a pool of "linked" items as 
possible. 
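
The difficulty screen used in stage three can be sketched as two steps: find the range of difficulty values shared by both instruments within a matched subcategory, then flag items lying within plus or minus .25 of that range. The sketch below is one reading of the procedure, not the authors' own computation; the function names are hypothetical and the difficulty values are illustrative rather than taken from Table 5.

```python
def overlap_range(a, b):
    """Return (low, high) of the difficulty range shared by lists a and b,
    or None if the two ranges do not overlap."""
    low = max(min(a), min(b))
    high = min(max(a), max(b))
    return (low, high) if low <= high else None


def split_by_range(difficulties, low, high, margin=0.25):
    """Separate items strictly inside the shared range from items that are
    adjacent to it, i.e. within `margin` of either extreme."""
    strict = [d for d in difficulties if low <= d <= high]
    adjacent = [d for d in difficulties
                if low - margin <= d < low or high < d <= high + margin]
    return strict, adjacent


# Illustrative proportions correct for one matched subcategory (hypothetical).
test_a = [0.30, 0.55, 0.61, 0.71]
test_b = [0.58, 0.66, 0.68]

low, high = overlap_range(test_a, test_b)
print("shared range:", low, high)
print("test A strict/adjacent:", split_by_range(test_a, low, high))
print("test B strict/adjacent:", split_by_range(test_b, low, high))
```

Items in the strict list would be reviewed first and items in the adjacent list second; the judgment of "more or less similar" remains a specialist's qualitative rating and is not something the screen decides.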



Interpretation of Qualitative Equating Information 

In the sample analysis, the similarities between standardized 
achievement tests and instructional accomplishment inventories decrease 
as the skills, and eventually the items, become more specific. At the 
most general analytical level, that is on the surface, the two 
instruments look somewhat alike. This first cut comparison, however, reveals 
some interesting differences about the nature of the subject matters. 
For example, reading is a subject matter where most of the specific 
technical "reading" skills are pretty much taught in grades one, two, 
and early grade three. After that time, students do not learn to read, 
as much as they "read to learn." That is, they apply their reading 
skills in order to read and understand longer, more sophisticated texts. 
Hence there is a small number of broad reading constructs represented 
in both CTBS and SES at grades three and six, and the general constructs 
can be maintained grade-by-grade. The content of mathematics instruction 
is different. Through grade six, there are specific, technical mathematics 
skills that students are expected to learn. These kinds of technical 
skills actually increase in number in grades four and five, while 
applications tend to have a low profile throughout the intermediate grades in 
most programs of instruction. Hence, there are many mathematics constructs 
represented in both CTBS and SES at grades three and six. These general 
constructs do more changing grade-by-grade in mathematics than in reading, 
because there is less emphasis on process and more emphasis on specific, 
technical skills. 

At the general instrument level of analysis, the querying categories 
of both instruments, although not the reporting categories, seem to focus 
on about the same general skills. In a standardized achievement test, 
many skills tend to be collapsed into one skill category for reporting; 
whereas in an instructional accomplishment inventory, more homogeneous 
skills categories are reported separately whenever possible. 

Tables 1 and 2 give qualitative indicators that the statistical
equating of the SES and CTBS for reading has a tenuous qualitative basis
at best. More than half of the broad skill categories which are assessed
and reported in the SES are not represented in the CTBS reading test,
unlike the mathematics test, where all of the broad skill categories in
the SES are collapsed under even broader CTBS categories. The SES in
reading represents large chunks of instruction which simply are not
reflected in CTBS reading. This means that more than half of the SES
and CTBS items are not equatable in any way that reflects actual skills
assessed. Specifically, 30 of the 53 SES reading items in grade three,
and 31 of the 62 SES reading items in grade six, fall in general querying
and reporting categories that have no counterpart in the CTBS reading
test. The remaining appearance of equatability at the general instrument
level is an illusion created by categories with the same names but
substantively different representation. The illusion became clear as the
analysis moved to finer levels of specificity.
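
The item counts quoted above can be turned into proportions directly. The
following is a minimal sketch in Python, using only the four figures given
in the text; the variable and function names are illustrative and are not
part of the original study.

```python
# Item counts quoted in the text; names are illustrative only.
ses_reading = {
    "grade 3": {"total_items": 53, "no_ctbs_counterpart": 30},
    "grade 6": {"total_items": 62, "no_ctbs_counterpart": 31},
}


def unmatched_share(counts):
    """Fraction of SES items in categories with no CTBS counterpart."""
    return counts["no_ctbs_counterpart"] / counts["total_items"]


shares = {grade: unmatched_share(c) for grade, c in ses_reading.items()}
for grade, share in shares.items():
    print(f"{grade}: {share:.1%} of SES reading items have no CTBS counterpart")
```

The complement of each share is the portion of the SES reading instrument
that remains available for comparison at the subcategory level.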

At the subcategory level of analysis, the fit between CTBS and SES
reading instruments nearly disappeared, and the fit between the mathematics
instruments began to strain. Tables 6 and 7 compare the reading
instruments at grades three and six, respectively. In the vocabulary test,
the tables show that CTBS assesses only one subskill (coded as synonyms),
using a very large number of items with an identical format (test length
being an important factor in obtaining maximum subtest reliability). SES,
on the other hand, represents the broad scope and emphasis of vocabulary
instruction at these grade levels by surveying several subcategories of
vocabulary skills. This procedure provides just two synonym items in SES
that might possibly be equated to the 40 synonym items in CTBS. A total
of ten equatable reading items were identified at grade three, and a total
of seven equatable items were identified at grade six. There are no more
items.

Table 6: Comparison of CTBS and SES Subcategories in Vocabulary and
Comprehension Skill Categories for Grade 3

Categories                        CTBS          SES

Vocabulary                        40 items      10 items
   Prepositions                                 X
   Synonyms                       X             X (2 items)
   Context clues                                X
   Definitions                                  X
   Antonyms                                     X

Comprehension                     45 items      13 items
   Details                        X             X (3 items)
   Sequence                       X             X (3 items)
   Drawing conclusions            X             X (2 items)
   Main idea                      X
   Classification                               X

Total number of subcategories:    5             9

Total number of SES items
in similar subcategories:                       (10)
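
The subcategory-level match summarized in Table 6 amounts to a set
intersection followed by an item count. The sketch below is one way to
express it in Python. The data are a transcription of Table 6 as it is read
here (an assumption where the scan is unclear), `None` marks SES
subcategories for which the table reports no item count, and all names are
hypothetical.

```python
# Transcribed from Table 6 (grade three reading); treat as an assumption.
ctbs = {
    "vocabulary": {"synonyms"},
    "comprehension": {"details", "sequence", "drawing conclusions", "main idea"},
}
# SES values are item counts where Table 6 reports them, otherwise None.
ses = {
    "vocabulary": {"prepositions": None, "synonyms": 2, "context clues": None,
                   "definitions": None, "antonyms": None},
    "comprehension": {"details": 3, "sequence": 3, "drawing conclusions": 2,
                      "classification": None},
}


def shared_subcategories(ctbs, ses):
    """Map each skill category to the subcategories present in both instruments."""
    return {cat: sorted(ctbs[cat] & set(ses[cat])) for cat in ctbs}


shared = shared_subcategories(ctbs, ses)
ses_items_in_shared = sum(
    ses[cat][sub] for cat, subs in shared.items() for sub in subs
)
n_ctbs = sum(len(subs) for subs in ctbs.values())
n_ses = sum(len(subs) for subs in ses.values())

print(shared)
print(n_ctbs, n_ses, ses_items_in_shared)
```

Under this transcription the three printed totals correspond to the two
subcategory totals and the parenthesized item total at the foot of Table 6.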



Table 7: Comparison of CTBS and SES Subcategories in Vocabulary and
Comprehension Skill Categories for Grade Six

Categories                        CTBS          SES

Vocabulary                        40 items      12 items
   Definitions                                  X
   Synonyms                       X             X (2 items)
   Figurative language                          X
   Antonyms                                     X

Comprehension                     45 items      19 items
   Drawing conclusions            X             X (1 item)
   Prediction                                   X
   Irrelevancy                                  X
   Classification                               X
   Main idea                      X             X (1 item)
   Details                        X             X (2 items)
   Cause/effect                                 X
   Sequence                       X             X (1 item)
   Quotations                                   X
   Poetry elements                X
   Figurative context             X

Total number of subcategories:    7             13

Total number of SES items
in similar subcategories:                       (7)

There is little reason to carry the analysis of the reading
instruments any further. Even if all ten items at grade three and all
seven items at grade six identified as equatable at the subcategory level
were to prove equatable at the item level of qualitative analysis, it
makes no sense to equate statistically seven or ten items of a 50 or 60
item instrument to 85 items of another instrument. In fact, not all ten
SES reading items at grade three, nor all seven at grade six, have a
corresponding item form in CTBS reading. For example, neither of the
SES synonym items at these grade levels is qualitatively "similar" to
CTBS items, not even where item difficulty values are similar. All 40
of the CTBS synonym items have the same performance format: a two- to
four-word ambiguous phrase is the "stimulus," with a target word that
is frequently above grade level and with four response choices, including
two or three acceptable, if not "best," answers. SES synonym items are
set intentionally in disambiguating two- to four-sentence contexts, with
three or four response choices, and much more emphasis on "right" answers.
Even using a liberal metric for equating items, these kinds of performance
formats are not similar. Most often, performance scores won't be similar
either.



CTBS--Level 1, Form S

   good idea

      O example
      O fact
      O mood
      O thought

   P = .53

SES--Grade Three

   24. John wants to buy that coat. He does not care about the price.

       Which word means the same as the underlined word?

      O market
      O order
      O cost
The qualitative equating story for mathematics in the present study
has about the same ending as that for reading. It just takes longer to
tell, and the final break is not obvious until the analysis reaches the
item level. However, there is an interesting twist in the tale when it
is told for mathematics, and so an abbreviated version of that story is
provided here. For this purpose, the analysis can focus on the broad
area of computation. Table 8 shows the subcategory breakdown for the
grade three instrument.

The asterisks in Table 8 point out an interesting feature of
standardized testing. They indicate that four of
the nine subcategories assessed in CTBS for grade three represent skills
that are well beyond mainline grade three instruction. From a psychometric
perspective, this procedure makes sense. Instruction throughout
the primary grades tends to be quite successful, i.e., many more students
learn more of the skills that are taught about on schedule than they do
in later grades. Therefore, performance scores on the mainline skills
will have some tendency to cluster in the middle to high score ranges at
grades one, two, and three, and a level of difficulty has to be added
to the CTBS test to distribute scores in a downward direction. In
reading, this is frequently accomplished by manipulating the language
(e.g., a complex syntax may be used, vocabulary that is above grade
level may be included in items, etc.). With computation, it is difficult
to affect scores by manipulating the language, since most computation
items are basically language free. What can be done to improve the
discrimination power of the test is to incorporate a large number of items
on skills that may be only introduced at the grade level but are taught
and learned at a higher grade level. These kinds of skills occur frequently
in mathematics instruction because of the linear characteristics
of that instruction. For example, students are taught to add and subtract
with regrouping on small numbers before they are taught to add and subtract
with regrouping on large numbers. The former skill is expected to come on
line by the end of grade three and the latter skill by the end of grade
four. At the same time, most comprehensive programs for mathematics
instruction will include a small number of lessons near the end of grade
three to briefly introduce the skill.

Table 8

Comparison of CTBS and SES Subcategories
in the Computation Skills Category
for Grade Three

Categories                                          CTBS        SES

Computation                                         48 items    14 items

    Addition and subtraction of numbers to 2-digits, regrouping as necessary
    Addition and subtraction of numbers to 3-digits, no regrouping
    Addition and subtraction of numbers to 3-digits, regrouping as necessary
   *Addition and subtraction of large numbers
    Multiplication and division facts
    Multiplication by 1-digit multipliers, no regrouping
    Multiplication by 1-digit multipliers, regrouping as necessary
   *Division by 1-digit divisors, no regrouping
   *Division by multiple of 10

CTBS column entries: X in each of the nine subcategories
SES column entries:  X (6 items), X (2 items), X (6 items)

Total number of subcategories in
this skills category:                               9           3

Total number of SES items in
similar subcategories:                                          (14)

*Introduced late in grade three, retaught seriously in grade four.

The fit between standardized achievement tests and instructional
accomplishment inventories seems to improve for mathematics in the
intermediate grades, as illustrated in Table 9. Finally, at least for
this large subtest area, there seems to be a fairly large number of
equatable CTBS and SES items. This makes sense from both a psychometric
and an instructional perspective. By grade six, the skills involved in
mathematics instruction have become much more difficult for most students.
Therefore, items taken directly from instruction and practice have a
natural tendency to have moderate difficulty levels, which is necessary
from a psychometric perspective, so there is no need to "add" any
difficulty. To the contrary, since most students are going to cluster
in the middle-to-low score ranges on skills that are taught in instruction,
a level of difficulty often has to be removed from the computation
subtest in order to spread scores out in an upward direction. This
condition makes for an interesting twist in the standardized test for
mathematics in the intermediate grades. A large number of items are
incorporated that represent skills which are mainly taught and learned
at lower grade levels. The asterisks in Table 9 indicate a number of
subcategories in the CTBS test intended for use in grade six which
represent instructional content that is somewhat below mainline grade
six computation instruction. Thus, in the intermediate grades, the poor
fit in computation subcategories is due largely to CTBS items that are
below grade level, a very different condition from grade three.
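
The preference for items of moderate difficulty can be illustrated with a
standard result that is not stated in this paper: a right/wrong item with
proportion correct P contributes score variance P(1 - P), which is largest
at P = .50. The sketch below, in Python, is only an illustration of that
general point. The values .53, .60, and .80 are P values printed with the
sample items in this paper; .95 and .20 are hypothetical values added for
contrast.

```python
def item_variance(p):
    """Variance of a dichotomous (right/wrong) item with proportion correct p."""
    return p * (1.0 - p)


# .53, .60, .80: P values from the sample items in this paper.
# .95 and .20: hypothetical very easy and very hard items.
for p in (0.95, 0.80, 0.60, 0.53, 0.20):
    print(f"P = {p:.2f}   item variance = {item_variance(p):.4f}")
```

Items that nearly all students pass, or nearly all fail, add little spread
to total scores, which is consistent with the argument above for adding
difficulty in the primary grades and removing it in the intermediate grades.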

This "twist" identifies just one area of poor fit between CTBS and
SES mathematics instruments, and it is mostly confined to computation
skills. In the other skill areas, statistical equating becomes as
suspect as it does for reading because of the number of SES items that
are lost before and after the comparison process begins. By the time
an item analysis is extended to the entire 98 items in the CTBS test
and the 60 or so items in the SES instrument, the number of equatable
items is very small in proportion to total test length in both grades
three and six.

Table 9

Comparison of CTBS and SES Subcategories
in the Computation Skills Category
for Grade Six

Categories                                          CTBS        SES

Computation                                         48 items    21 items
    Addition and subtraction of numbers to
      3-digits, regrouping                          X
    Addition and subtraction of large numbers       X           X (5 items)
   *Multiplication and division facts               X
   *Multiplication by 1-digit multipliers,
      regrouping as necessary                       X
    Multiplication, large numbers by 2-3 digit
      multipliers                                   X           X (2 items)
    Division by 1-digit divisors, regrouping
      as necessary                                  X           X (2 items)
    Division, large numbers by 2-3 digit
      divisors                                      X           X (1 item)
    Addition and subtraction of fractions, like
      denominators, regrouping as necessary         X           X (3 items)
    Addition of fractions, unlike denominators,
      regrouping as necessary                       X
    Multiplication and division of fractions        X           X (2 items)
    Addition and subtraction of decimals            X           X (2 items)
    Multiplication and division of decimals by
      whole number, 10, 100                         X           X (4 items)
    Division of decimals by decimals                X

Total number of subcategories
in this general skills category:                    13          8

Total number of SES items
in similar subcategories:                                       (21)

*Reviewed early in grade six, but taught seriously in grades three,
four, and five.

For mathematics, unlike reading, much of the incompatibility between
SES and CTBS becomes most apparent at the item level. For example, in the
primary grades, CTBS items, unlike SES items, tend to focus on the moderate
to difficult nuances of skills (e.g., regrouping with zeros, or regrouping
in two places, etc.). The simple nuances, which often receive the heaviest
emphasis in instruction, tend to be represented with a few items in CTBS.
A typical example of item incompatibility is shown below, and it
demonstrates this focus on different nuances of a skill.



CTBS--Level 1, Form S

   Mr. Smith washed his car. The two clocks show you when he
   started and when he finished. At what time did he finish?

      START          FINISH
      [clock face]   [clock face]

      O 6:40
      O 7:30
      O 8:00
      O 8:30

   P = .60

SES--Grade 3

   Mark the time.

      [clock face]

      O 6:10
      O 10:30
      O 10:00

   P = .80



These measurement items belong nominally to the same subcategory--time.
But the items, as well as the performance scores (P values), are clearly
not comparable.

General Applicability of Qualitative Equating 

This paper has presented a method for qualitative equating, a matter
which has not been studied seriously until now. The presentation has
focused on instructional accomplishment inventories and standardized
achievement tests. However, the method has wider applicability to all
varieties of instruments. Traditionally, test developers have described
the statistical relationships between instruments, but they almost never
describe the meaningfulness of the relationship. By preceding quantitative
analysis with a qualitative analysis, researchers can provide a
logical foundation for equating two instruments. In some instances, the
results of the qualitative analysis will reveal that there is no meaningful
basis for conducting statistical equating, eliminating the necessity
for a statistical analysis. In other applications, the qualitative results
will support a quantitative equating operation.
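
The idea of screening qualitatively before equating statistically can be
sketched as a simple check. The Python fragment below uses the reading
figures reported earlier in this paper (ten and seven equatable SES items,
53 and 62 SES items, 85 CTBS items). The 0.5 cutoff, the class, and the
function names are hypothetical and are not part of the method as
presented; the paper itself sets no numerical threshold.

```python
from dataclasses import dataclass


@dataclass
class QualitativeBasis:
    label: str
    equatable_items: int  # SES items surviving the subcategory-level match
    ses_items: int        # total SES items
    ctbs_items: int       # total CTBS items (kept for reporting only)

    def coverage(self):
        """Equatable items as a share of the SES instrument."""
        return self.equatable_items / self.ses_items


def supports_statistical_equating(basis, min_share=0.5):
    """Hypothetical screening rule: coverage must reach min_share."""
    return basis.coverage() >= min_share


# Reading figures reported earlier in this paper.
reading = [
    QualitativeBasis("grade 3 reading", 10, 53, 85),
    QualitativeBasis("grade 6 reading", 7, 62, 85),
]
for basis in reading:
    print(basis.label, f"{basis.coverage():.2f}",
          supports_statistical_equating(basis))
```

Any such cutoff is a judgment call. The sketch only makes explicit the
question of how much of an instrument survives the qualitative match before
statistical operations begin.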

In any case, the method presented should refine the paradigm for
quantitatively equating testing instruments. Researchers now have another
question to ask before proceeding with statistical operations: "How
extensive is the qualitative basis for equating the specific instruments?"
It is unscientific to assume that the answer will always support quantitative
procedures for equating instruments. On the other hand, where it is
shown that testing instruments do have a qualitative relationship, the
statistical relationship between the instruments takes on a better informed
meaning.